Advanced learning system for detection and prevention of money laundering

ABSTRACT

An automated system for detecting risky entity behavior using an efficient frequent behavior-sorted list is disclosed. From these lists, fingerprints and distance measures can be constructed to enable comparison to known risky entities. The lists also facilitate efficient linking of entities to each other, such that risk information propagates through entity associations. These behavior sorted lists, in combination with other profiling techniques, which efficiently summarize information about the entity within a data store, can be used to create threat scores. These threat scores may be applied within the context of anti-money laundering (AML) and retail banking fraud detection systems. A particular instantiation of these scores elaborated here is the AML Threat Score, which is trained to identify behavior for a banking customer that is suspicious and indicates high likelihood of money laundering activity.

TECHNICAL FIELD

The subject matter described herein relates to computer-based machine learning systems and methods, and more particularly to advanced learning systems and methods for detection and prevention of money laundering.

BACKGROUND

Money laundering is a term given to a process of taking funds from an illicit activity and manipulating them through the financial system such that they appear to be from a different and legitimate source. Money laundering is a complex, worldwide problem, with estimates of $800B to $2T USD being laundered every year. Laundering typically includes three steps: (1) placement, where the illicit funds are first introduced to the financial system; (2) layering, where the illicit funds are combined through multiple transactions with legitimate sources; and (3) integration, where the illicit funds are returned to the launderer through seemingly legitimate transactions.

Money laundering occurs through a wide variety of financial products and access channels, including current accounts (EFT/ACH/SWIFT, wire, check, cash), loans, investment products, credit cards (purchases, returns, overpayments) and debit cards (traditional and pre-paid). A recent proliferation of technologies, from mobile payments to cryptocurrencies, has increased the difficulty of finding a comprehensive solution.

Traditional approaches to create anti-money laundering (AML) solutions have focused on rules-based systems to meet specific regulatory requirements. For example, the US Bank Secrecy Act of 1970 required enhanced reporting of transactions exceeding $10,000. However, basic rules like these were easily circumvented by breaking up large transactions into smaller amounts that would avoid triggering such rules.

Modern detection systems, such as fraud detection systems, rely on training data for supervised or semi-supervised machine learning methods to improve analysis of activities or transactions to detect targeted behaviors or actions. The better the training data, the more accurate and efficient the detection system. Some regulations require that Suspicious Activity Reports (SARs) be filed in cases with sufficient suspicion of wrong-doing. SARs are quite rare, so one of the main challenges for supervised learning is the highly unbalanced nature of the classes. The incorporation of certain semi-supervised techniques can help address this issue. While traditional AML systems create a high volume of alerts, only a small fraction of these alerts will be investigated and lead to a SAR filing with regulators. Accordingly, what is needed is a system and method to prioritize alerts, to determine those cases with the highest likelihood of laundering activity, and to link alerts with SARs to develop a robust source of training data for supervised or semi-supervised machine learning methods and systems for detection of money laundering.

SUMMARY

This document describes an automated system for detecting risky entity behavior using an efficient frequent behavior-sorted list. From these lists, fingerprints and distance measures can be constructed to enable comparison to known risky entities. The lists also facilitate efficient linking of entities to each other, such that risk information propagates through entity associations. These behavior sorted lists, in combination with other profiling techniques, which efficiently summarize information about the entity within a data store, can be used to create threat scores. These threat scores may be applied within the context of anti-money laundering (AML) and retail banking fraud detection systems. A particular instantiation of these scores elaborated herein is the AML Threat Score, which is trained to identify behavior for a banking customer that is suspicious and indicates high likelihood of money laundering activity.

In one aspect, methods having one or more operations, non-transitory computer program products storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations, and systems comprising at least one programmable processor and a machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one programmable processor to perform operations are described.

The operations can include creating one or more profiles for an entity of interest. Each profile can be formed as a data structure that captures statistics of the entity's behavior without storing a record of past activity of the entity. Each profile can comprise a plurality of behavior sorted lists and recursive features. Each behavior sorted list can be formed of a tuple of entries. Each entry can include a key, a weight, and a payload that represent a frequently-observed behavior of the entity, or the like. The recursive features can be configured to summarize the frequently-observed behavior of each profile.

The operations can include storing the one or more profiles in a data store. For each input data record associated with the entity of interest, one or more relevant profiles can be retrieved from the data store. The one or more relevant profiles can be updated to recursively compute summary statistics of behavior of the entity by adding or updating an observed behavior represented by the input data record to at least one of the plurality of behavior sorted lists with a full weight while decaying the weights of existing observed behaviors.

Entries can be compared between at least two of the plurality of behavior sorted lists to generate a numerical value representing a consistency between entries in the behavior sorted list of the entity and those in the behavior sorted lists of other entities, including risky entities and their associated behavior sorted list entries.

One or more distance models can be applied to the plurality of behavior sorted lists to determine a variation of the consistency between the entries in the at least two behavior sorted lists according to the numerical value.

An anti-money laundering threat score can be generated. The anti-money laundering threat score can be generated utilizing self-calibrating outlier models based on the entity's recursively summarized profile behavior and recurrences in the entity's behavior sorted list. The anti-money laundering threat score can be based on the variation of matches on behavior sorted lists of risky entities. The anti-money laundering threat score can represent a threat risk that the entity of interest is engaged in money laundering.

In some variations, the payload of at least one entry of the one or more profiles includes recursive features. The payload of at least one entry of the one or more profiles can include archetype distributions, derived archetype profile features, soft clustering misalignment scores, and/or the like.

The input data record is a transaction performed by the entity of interest.

Storing the one or more profiles in a data store can comprise storing the one or more profiles as an account on a server that is part of a cloud-based network of servers.

In some variations, two or more accounts can be linked from the cloud-based network of servers. A risk level can be associated with the linkage of accounts and can reflect use of the threat scores across the linked list of accounts. A degradation of a set of anti-money laundering threat scores can be determined. One or more outlier detection models can be retrained based on the degradation and/or using one or more auto-retraining mechanisms.

In some variations, a set of global profiles can be generated. The set of global profiles can represent a population of entities of interest.

Implementations of the current subject matter can include, but are not limited to, systems and methods consistent with one or more features described herein, as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations described herein. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to an enterprise resource software system or other business software solution or architecture, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.

DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,

FIG. 1 shows the components of the feature construction module and their interactions with the profile stores;

FIG. 2 illustrates operation of a behavior sorted list;

FIG. 3 shows an overview for the AML Threat Score system at run-time;

FIG. 4 illustrates an example of efficient account risk-linking; and

FIG. 5 illustrates an auto-retraining module.

When practical, similar reference numbers denote similar structures, features, or elements.

DETAILED DESCRIPTION

To address these and potentially other issues with currently available solutions, methods, systems, articles of manufacture, and the like consistent with one or more implementations of the current subject matter can, among other possible advantages, provide a set of modules for learning entity behavior, which can be combined into an AML Threat Score system. In some implementations, a system captures many aspects of customer and account behavior in order to alert financial institutions to activity that is similar to observed money laundering activity, or which is suspiciously anomalous for that entity based on its prior history and peer groups.

The AML Threat Score (or “score”) makes use of profiles to efficiently summarize past behavior in a data store, while minimizing the storage and lookup requirements on that data store. The score also makes use of self-calibrating technologies to adapt to dynamic real-world conditions, e.g., macro-economic factors such as currency fluctuation. The score can also make use of soft-clustering misalignment (SCM) technologies such as collaborative profiling and recurrent sequence modeling, which compare behavior across clusters and through sequences, as described in co-pending U.S. patent application Ser. No. 15/074,856, filed Mar. 18, 2016, and entitled “Behavioral Misalignment Detection within Entity Hard Segmentation utilizing Archetype-Clustering,” the contents of which are incorporated by reference herein for all purposes.

The profiles include the efficient favorite behavior-sorted list (BList) technologies, which efficiently capture favorite behavior associated with an entity. Data stored within a BList payload includes risk-linking features, which allow high-scoring risky entities to influence the scores of other associated entities in near real-time, without computing costly graph-based analytics.

The AML Threat Score can be used in conjunction with rule-based AML systems to prioritize rules-triggered cases, in order to reduce false positives. The AML Threat Score can also be used to directly create cases that have a high likelihood of laundering, in order to capture more suspicious activity which is missed by current rules-based systems.

Cloud-based consortium instantiations of the systems and methods described herein help link risky behavior across multiple banks and other institutions, giving a more global view of emerging threats. In particular, new payment channels such as digital currencies (BitCoin, etc.) have particular attraction to money launderers, and cloud-based implementations of the AML Threat Score system address this by monitoring points where digital currencies interact with traditional banking networks and channels.

In some preferred exemplary implementations, for each entity of interest (e.g., customers or accounts), a profile is created. The profile is a data structure which captures statistics of the entity's behavior in an efficient way, without storing the entire record of past activity. The profile is persisted into a data store, and is retrieved and updated for each input data record (e.g., transaction or non-monetary update). One key to the profiling concept is that the profile does not contain a set of previous records; instead, it uses recursive functions such as exponential decays to compute summary statistics of behavior.

Expected vs. actual (long-term vs. short-term) behavior is often suggested as an important indicator of money laundering activity. To implement this, values within the profile store long-term and short-term averages of transaction amounts, frequencies of transaction events, and related quantities. This process of combining the input data record with values saved in the profile through mathematical transformations is known as “feature construction”. Feature construction includes time- and event-based averages, e.g., implemented with recursive approximation using exponential decays. Feature construction can also include other types of calculations, such as self-calibrating features (see below), which rely on the use of population properties, which are stored in “global profiles”. These global profiles can represent the entire population, or smaller segments such as peer groups.
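
As an illustration only (the class name, half-life constants, and ratio feature below are assumptions, not the disclosed design), a recursively updated, exponentially decayed average can be maintained at two time scales and compared to produce an expected-vs-actual feature:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecayedAverage:
    """Recursively updated (exponentially decayed) average; no past records kept.

    half_life_days controls how quickly old behavior is forgotten: a long
    half-life approximates long-term (expected) behavior, a short one
    short-term (actual) behavior.
    """
    half_life_days: float
    value: float = 0.0
    last_event_time: Optional[float] = None  # in days

    def update(self, amount: float, event_time: float) -> float:
        if self.last_event_time is None:
            self.value = amount
        else:
            dt = max(event_time - self.last_event_time, 0.0)
            alpha = 0.5 ** (dt / self.half_life_days)  # time-based decay factor
            self.value = alpha * self.value + (1.0 - alpha) * amount
        self.last_event_time = event_time
        return self.value


# Expected-vs-actual feature: ratio of short-term to long-term average amounts.
long_term = DecayedAverage(half_life_days=90.0)
short_term = DecayedAverage(half_life_days=7.0)
for day, amount in [(0, 120.0), (3, 90.0), (10, 5000.0)]:
    long_term.update(amount, day)
    short_term.update(amount, day)
amount_ratio = short_term.value / max(long_term.value, 1e-9)
```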

Another type of feature construction is based on soft-clustering technologies, such as collaborative profiling and recurrent sequence modeling, which also make use of population or segment properties stored in the global profile. These techniques are designed to extract features of typical behavior, including archetypes and sequence models, which can then be clustered to form estimates of the typical distributions of normal behavior. Clusters are typically created within the entity segmentations used in a domain (e.g., a bank's segmentation of customers based on income, business type, geographic area, KYC threat segments, etc.). Behavior which deviates from typical clusters can be detected by constructing a soft-clustering misalignment (SCM) score from the numerical soft-clustering features. The SCM score and attributes used in the creation of the score can be one of the many features created within the profile-based feature construction module for the AML Threat Score.
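
The SCM construction itself is detailed in the referenced co-pending application; as a loose sketch of the idea only (the helper name, the toy data, and the choice of Euclidean distance are assumptions), a misalignment value can be taken as the distance from an entity's archetype distribution to the nearest cluster of typical behavior within its segment:

```python
import numpy as np


def scm_sketch(archetype_dist: np.ndarray, cluster_centroids: np.ndarray) -> float:
    """Distance from an entity's archetype distribution to the nearest cluster
    of typical behavior in its segment; larger values suggest behavior that is
    misaligned with the segment's typical clusters."""
    distances = np.linalg.norm(cluster_centroids - archetype_dist, axis=1)
    return float(distances.min())


# Example: 3 archetypes, 2 clusters of typical behavior within a segment.
centroids = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
entity = np.array([0.3, 0.6, 0.1])          # unusual mix of archetypes
misalignment = scm_sketch(entity, centroids)
```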

The output of the feature construction module is a numerical vector of features, x_(t), which is fed into the scoring and calibration module (see below).

FIG. 1 shows the components of the feature construction module and their interactions with the profile stores, illustrating an example of interaction between the profile and the feature construction module. Input data d_(t) from an entity A (e.g., customer or account) is presented to the feature construction module, where it is used to create recursive features, self-calibrating features and soft-clustering features. The output of feature construction is a vector x_(t) which contains all such features which will be used in later stages of the score creation. Soft-clustering features can be created from methods including collaborative profiling (CP) and recurrent sequence modeling (RSM). Soft-clustering features can be combined to create a soft-clustering misalignment (SCM) score, which indicates if behavior is deviating from the observed typical behavior within clusters found within a segment.

Behavior Sorted Lists

Frequent behavior sorted lists (“BLists”) are data structures stored in an entity's profile, and are used to track information in a space-efficient manner. The BList is space-limited, and uses a weighting mechanism to preserve favorite entries in the lists. The weighting mechanism is tuned so that frequently seen favorites are preserved in the list (gated on recency), such that newly seen items can be added, but not at the expense of deleting long-term, often-seen items.

BLists can also store a payload for each entry, which can include similar types of recursive features (see above, Profiling). For an entity A, the BList is a tuple (ordered list) of elements, (a₁, a₂, . . . , a_(n)), where n is the maximum allowed size of the BList. Each BList entry a_(i)=(k, w, p) contains a key k, weight w, and payload p (which typically includes the entry date of the element in the BList). The concept of “favorite” is formalized to mean an entry in the BList which has been on the list for a certain period of time (e.g., entry date >2 weeks in the past) and has a certain rank in the list (e.g., in the top half of the list, i<n/2). FIG. 2 shows basic BList operation.

FIG. 2: Operation of a BList. At each time step, an observation is considered for addition to the list. At t₁ . . . t₃, a new observation is added to the list, while the weight of existing entries is decayed. New entries are added with weight w=1, and the payload is constructed from the entry date. At t₄, the observation has a key=1 that has been seen before. In this case, the weight is updated so that key=1 is now at the top of the list, in the position of “top favorite”.
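
A minimal sketch of the BList mechanics described above and in FIG. 2 (the decay rate, list size, displacement rule, and favorite thresholds below are illustrative assumptions):

```python
import time
from typing import Any, Optional


class BList:
    """Frequent behavior-sorted list: fixed-size, weight-ordered list of
    (key, weight, payload) entries. Sketch only."""

    def __init__(self, max_size: int = 10, decay: float = 0.95):
        self.max_size = max_size
        self.decay = decay
        self.entries: list = []            # each entry: [key, weight, payload]

    def update(self, key: Any, payload: Optional[dict] = None) -> None:
        # Decay all existing weights so stale entries drift down the list.
        for entry in self.entries:
            entry[1] *= self.decay
        for entry in self.entries:
            if entry[0] == key:
                entry[1] += 1.0            # reinforce a repeated behavior
                break
        else:
            new_entry = [key, 1.0, {"entry_date": time.time(), **(payload or {})}]
            if len(self.entries) < self.max_size:
                self.entries.append(new_entry)
            elif self.entries[-1][1] < 1.0:
                # Only displace the weakest entry; long-term favorites survive.
                self.entries[-1] = new_entry
        self.entries.sort(key=lambda e: e[1], reverse=True)

    def is_favorite(self, key: Any, min_age_days: float = 14.0) -> bool:
        """Entry date older than the threshold and rank in the top half."""
        now = time.time()
        for rank, (k, _, p) in enumerate(self.entries):
            if k == key:
                old_enough = now - p["entry_date"] > min_age_days * 86400
                return old_enough and rank < self.max_size // 2
        return False
```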

Numerical features can be constructed from the BList. Features can range from basic binary features (e.g., “is element of BList”, “is not element of BList”, “is favorite element of BList”, “is top favorite element of BList”) to more complex features (“average weekly rate of missing an element in the BList”). BList “churn” measures how frequently entries drop off the list. For certain entities (e.g., a common entity like an electric utility account), their BLists of associated payer accounts will have higher churn, while a less common entity may have little or no churn.
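
For example, one simple churn feature could track the fraction of keys that left the list between observation windows (a hypothetical helper, not the disclosed feature set):

```python
def blist_churn(previous_keys: set, current_keys: set) -> float:
    """Fraction of previously held BList keys that have dropped off the list
    since the last observation window (one illustrative "churn" measure)."""
    if not previous_keys:
        return 0.0
    return len(previous_keys - current_keys) / len(previous_keys)


# A utility-like payee list churns more than a stable personal account's list.
churn = blist_churn({"A", "B", "C", "D"}, {"A", "C", "E", "F"})  # -> 0.5
```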

BList “fingerprinting” can be done by comparing the elements in one BList to those in another, giving a numerical value to the amount of consistency between the entries in both lists. The elements in a BList can be represented as a tuple, and distance metrics can be applied to compare the tuples. To compare the BList for an entity A=(a₁, a₂, . . . , a_(n)) with the BList for entity B=(b₁, b₂, . . . , b_(n)), a number of distance metrics can be used; some examples follow, with a combined sketch after them:

Set similarity distances such as the Jaccard distance can be used to compare A and B based on the commonality of entries between the two lists. However, set similarity metrics do not take into account the ordering of the BList.

Edit distances such as the Levenshtein distance can be used by considering each key k as a letter of a string and finding the number of edits required to transform the list of A's keys into the list of B's keys, where each edit may be an insertion, deletion or substitution of a key.

Efficient metrics can be computed by taking advantage of key-indexing in a datastore system. Permutations of the BList keys (tuples of various lengths) can be constructed, such as (k_(a1), k_(a2), k_(a3)), (k_(a1), k_(a3), k_(a2)), (k_(a2), k_(a1), k_(a3)), (k_(a3), k_(a1), k_(a2)), etc. These permutations are each stored as individual keys in the data store. A match of the top 3 keys (k_(b1), k_(b2), k_(b3)) from BList B against these keys would indicate similarity with the BLists stored in the data store. This provides a quick (computational complexity of order O(1)) method of determining if an entity is showing similar favorite behavior to another entity. Typically, the data store would store a small set of the suspicious/SAR account BList entries, and so it becomes an efficient way of determining similarity to risky entities. Permutations generally contain a small number of keys, from two to five.
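
A minimal sketch of these three kinds of fingerprint comparison, operating on the ordered key sequences of two BLists (the toy account data and helper names are assumptions):

```python
from itertools import permutations


def jaccard_distance(keys_a, keys_b) -> float:
    """Order-insensitive set-similarity distance over BList keys."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def edit_distance(keys_a, keys_b) -> int:
    """Levenshtein distance treating each key as a 'letter' of a string."""
    prev = list(range(len(keys_b) + 1))
    for i, ka in enumerate(keys_a, 1):
        curr = [i]
        for j, kb in enumerate(keys_b, 1):
            cost = 0 if ka == kb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Key-indexed fingerprints: store every ordering of a risky account's top keys,
# so a single O(1) lookup of another entity's top keys detects a shared favorite set.
sar_top_keys = {"acct_A": ("ACC123", "ACC987", "ACC555")}   # assumed toy data
sar_index: dict = {}
for account_id, top in sar_top_keys.items():
    for perm in permutations(top):
        sar_index.setdefault(perm, set()).add(account_id)

probe = ("ACC987", "ACC123", "ACC555")          # entity B's top-3 favorites
matches = sar_index.get(probe, set())           # -> {"acct_A"}

d_set = jaccard_distance(probe, sar_top_keys["acct_A"])   # 0.0: same key set
d_edit = edit_distance(probe, sar_top_keys["acct_A"])     # 2: ordering differs
```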

We refer to these metrics as “BList fingerprint distances”, where the choice of distance metric is informed by the specifics of the application, including the key type and efficiency requirements. The permutation-of-keys BList fingerprint method described above is fast enough to be performed for real-time decisioning, and essentially makes a space vs. time tradeoff. The set similarity and edit distance BList fingerprints described above can involve substantial computational effort, such that they can be performed in a batch process (e.g., daily) against only a small set of accounts, e.g., those with a known SAR or fraudulent entity.

In AML applications, multiple BLists can be created for each entity, and used to store credited and debited accounts, countries, and usage of access channels and processing channels. Additional numerical features can be constructed on the payload values, e.g., a BList on accounts with a payload that stores the average payment amount. In AML and retail banking applications, BList features help detect shifts in payment and transfer behavior, such as suddenly paying a large number of never-associated accounts.

The BList fingerprint distance can be used in a number of ways for AML and fraud prevention. If two accounts, A and B, have very similar BList fingerprints, A was known to have suspicious activity, and account B began to behave like the first, then we could conclude that B is more likely suspicious and should be investigated. If we preserve account A's profile and BLists, then we can compare new accounts against it, even after account A has been blocked or closed.

Anomaly Detection Using Self-Calibrating Outlier Technology

The numerical features created in the feature construction module may undergo changes in distribution over time while the system is deployed. Changes in distribution can be due to changes in currency valuation, macroeconomic factors, target customers for the financial institution, and changes in risk patterns due to differing schemes for money laundering. Since these distributional changes cannot be known at design-time, the system applies self-calibrating outlier detection methods to those features that are deemed susceptible to significant changes.

Single-variate outlier detection is done by run-time estimation of parameters of the distribution, such as quantiles, using self-calibrating algorithms. These distribution parameters are stored in a global profile in the data store, which is common for all entities (or all entities within a peer grouping). Once a set of quantiles (θ_(l), θ_(h); e.g., the 90th and 99th percentiles) has been estimated, an outlier feature q is found from a new observation x_(i):

$q\left( x_{i} \right) = \min\left( \max\left( \frac{x_{i} - \theta_{s}}{\theta_{h} - \theta_{l}},\, 0 \right),\, C \right) \in \left\lbrack 0, C \right\rbrack$

where θ_(s) is a design parameter, often set to either θ_(h) or θ_(l), and C is used to limit the outlier feature to a certain range.
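
For instance, under the formula above (with assumed quantile estimates, θ_(s) set to the lower quantile, and an assumed cap C), the outlier feature could be computed as:

```python
def outlier_feature(x_i: float, theta_l: float, theta_h: float,
                    theta_s: float, C: float = 6.0) -> float:
    """Scaled outlier feature q(x_i) from run-time quantile estimates held in
    the global profile; C caps the feature to the range [0, C]."""
    if theta_h <= theta_l:
        return 0.0                        # degenerate quantile estimates
    q = (x_i - theta_s) / (theta_h - theta_l)
    return min(max(q, 0.0), C)


# Example: 90th/99th percentile estimates of a transaction-amount feature.
q = outlier_feature(x_i=25_000.0, theta_l=8_000.0, theta_h=20_000.0,
                    theta_s=8_000.0)      # -> ~1.42, a mild outlier
```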

Creation of the AML Threat Score

In some implementations, a software system creates an AML Threat Score based on advanced learning algorithms, including the modules described in the previous sections. Two major sources of data are fused together to create an AML Threat Score at the customer and account levels: transactional and non-monetary information. Transactional data includes monetary transfers, payments, debits, etc., and non-monetary data includes customer applications, customer updates, demographics, account updates, beneficiary changes, and on-line banking information (page views, logins, etc.). FIG. 3 shows an overview of the AML Threat Score system.

FIG. 3: Overview of the AML Threat Score system (run-time). Input data from customer systems and transactions from banking channels are sent in streaming or batch mode to the feature construction module. The feature construction module takes the input data (of multiple types, such as categorical, ordinal, text) and converts them into numerical features, which are stored in and retrieved from profile data stores. Relevant entities to be profiled for AML include customers and accounts. A distinction is made between customers and accounts who have direct relationships with the financial institution (on-us) and those who do not (off-us). Once features are constructed, this numerical vector is converted to the AML Threat Score using the scoring and calibration module.

Regulations require that institutions track details of their customers, which are referred to as Know Your Customer (KYC) rules. KYC information includes:

-   a) Identity confirmation, including personal and business information on the nominal and beneficial owners of an account.
-   b) Risky countries.
-   c) Politically exposed persons.
-   d) Watch-lists.
-   e) Enhanced Due Diligence (EDD) in cases of suspected higher risk, which includes source of wealth/funds, nature of business, the customers of that business, references, etc. Some of this information may be in unstructured plain text.

Traditional processes include vetting this information during on-boarding. The presented system uses on-boarding information, as well as any updates to the KYC information, presented over the lifetime of the customer's relationship with the institution.

For each data record (monetary transaction or non-monetary record), the scoring and calibration module combines the features from the feature construction module and generates an AML Threat Score, which is an integer in [1,999], where low values indicate low probability that the entity state and current record point to money laundering activity, and where high values of the score indicate high probability of laundering activity, as well as similarity with behavior of known past SAR cases.
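
Purely as an illustration of the score range (the actual mapping uses the design-time and run-time calibration steps described below), a calibrated model output in [0, 1] could be mapped onto [1, 999] as:

```python
def to_threat_score(calibrated_probability: float) -> int:
    """Map a calibrated probability in [0, 1] to the integer range [1, 999]
    (illustrative mapping only)."""
    p = min(max(calibrated_probability, 0.0), 1.0)
    return 1 + round(998 * p)


assert to_threat_score(0.0) == 1 and to_threat_score(1.0) == 999
```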

Financial institutions are required to file SARs, and details from SAR cases are available to the scoring module as a supervised training signal for machine learning algorithms. Supervised learning algorithms used by the scoring module can include one or more of: linear regression, logistic regression, multi-layer perceptron neural networks, scorecards, support-vector machines, adaptive models, and decision trees.

The AML Threat Score system (FIG. 3) is typically run with data from a single institution, and a distinction is made between customers and accounts held at that institution (on-us) and those held at other institutions (off-us). The number of off-us customer and account profiles can get very large, in fact covering all the customers and accounts at institutions worldwide. Concise profiling technology is used to maintain the profiles of the most relevant off-us entities within the limits of the storage capacity of the AML solution. The most relevant entities are preserved based on a combination of factors: frequency of activity, recent high AML Threat Scores, monetary value of transactions, watch list status, etc. In FIG. 3, the customer and account on-us profiles are sized to store all entities with relationships with the institution, while the off-us profiles are sized based on the trade-off between storage cost and prediction accuracy.

The AML Threat Score system can interface with other AML solutions which produce rule-based alerts. Because some of these alerts did not lead to SARs being filed (false alarms), this additional information can be used (during design-time) as supervised training signals, with the goal of lowering the scores of records which are similar to those false-alarm rule-based alerts.

Because a considerable amount of human effort is involved in investigating a case for potential SAR filing, it is important for the AML Threat Score system to produce scores which are stable over time, and to have score bands which represent a consistent likelihood of risk over time. To achieve this, design-time and run-time calibration steps are used to adjust the output of the supervised learning algorithm to produce the final AML Threat Score. Such calibration methods can include on-line distribution estimates. The AML Threat Score can be designed to supplement rules cases by prioritizing those rules-generated cases to work first. Additionally, when no AML rules have fired, the AML Threat Score can be used independently to generate SAR investigations and work cases.

Efficient Account Risk-Linking

Layering of funds is typically an integral part of the laundering process. If an account interacts with other accounts known or suspected of laundering, this increases the risk of illicit activity, and that account should be flagged as suspicious. While tree-based link analysis can be performed, we propose a fast and efficient way of linking accounts through behavior-sorted lists within profiles.

Building on the account profiling, BLists and AML Threat Score discussed above, each account profile stores four relevant items, which comprise the “risk-linking” features:

-   a) HighestScoreRecent: the account's highest score (decayed over time).
-   b) HighestLinkedScoreRecent: the highest score of any associated account (decayed over time). This allows linking of risk across multiple accounts (not just accounts that transact directly).
-   c) BList: Credited Accounts: a list of accounts that this account has transferred funds to (including their high scores).
-   d) BList: Debited Accounts: a list of accounts that this account has received funds from (including their high scores).

FIG. 4 shows an example of these elements for a set of transactions. The key processes in the risk-linking are (1) storing the score in the profiles of both parties to a transaction, and (2) in the scoring and calibration module (see FIG. 3), using the risk-linking features to create a “risk-linked score”, which is merged with the raw score. The feature creation module (see FIG. 3) is responsible for decaying the risk-linked features (scores), and a number of decay strategies are possible (a sketch of the risk-linking update follows the list below):

-   a) Event-averaged decay, where the scores are decayed based on the number of events that have occurred since the high-scoring event.
-   b) Time-based decay, where the scores are decayed based on the time between the current event and the earlier high-scoring event.
-   c) Time-to-live decay, where the scores are preserved (unaltered) for a certain time period.
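
A minimal sketch of the risk-linking update for a single transfer, using the FIG. 4 example (the profile layout, per-event decay factor, and the rule for merging the raw and linked scores are assumptions):

```python
def update_risk_links(profiles: dict, src: str, dst: str, raw_score: int,
                      decay: float = 0.99) -> int:
    """Risk-linking update for a transfer src -> dst: store the score with
    both parties and merge the raw score with the payer's carried risk."""
    empty = {"HighestScoreRecent": 0.0, "HighestLinkedScoreRecent": 0.0,
             "credited": {}, "debited": {}}
    p_src = profiles.setdefault(src, dict(empty, credited={}, debited={}))
    p_dst = profiles.setdefault(dst, dict(empty, credited={}, debited={}))

    # Risk already attached to the paying account (its own high score, or risk
    # inherited from earlier counterparties), decayed per event.
    linked_risk = max(p_src["HighestScoreRecent"],
                      p_src["HighestLinkedScoreRecent"]) * decay

    # Assumed merge rule: the risk-linked score never falls below the raw score.
    risk_linked_score = int(max(raw_score, linked_risk))

    # Persist the score with both parties to the transaction.
    p_src["HighestScoreRecent"] = max(p_src["HighestScoreRecent"] * decay, raw_score)
    p_src["credited"][dst] = max(p_src["credited"].get(dst, 0), risk_linked_score)
    p_dst["debited"][src] = max(p_dst["debited"].get(src, 0), risk_linked_score)
    p_dst["HighestLinkedScoreRecent"] = max(
        p_dst["HighestLinkedScoreRecent"] * decay, risk_linked_score)
    return risk_linked_score


# FIG. 4 example: account 1 -> account 2 scores 900; later account 2 -> account 3
# scores only 300 in isolation but inherits linked risk from account 1.
profiles: dict = {}
s1 = update_risk_links(profiles, "acct1", "acct2", raw_score=900)   # -> 900
s2 = update_risk_links(profiles, "acct2", "acct3", raw_score=300)   # -> elevated
```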

FIG. 4: Example of efficient account risk-linking. In this example, account 1 makes a risky transfer of funds to account 2, which scores 900. That high score is stored in account 1's profile as well as in account 2's profile in “BList: Debited Accounts”. The profile of account 2 is also updated with HighestLinkedScoreRecent. Note that this first transaction has the same value for the raw score as the risk-linked score, because no prior risk-linking information was available. Then at t₂, account 2 transfers funds to account 3.

While the transaction from account 2 to account 3 would appear less risky in isolation (having a raw score of 300), this transaction and account 3 can be informed of the linked risk through three features in account 2 (HighestScoreRecent, “BList: Debited Accounts” and HighestLinkedScoreRecent). The arrows show the risk-linking from account 1 to accounts 2 and 3. The risk-linked score is created by merging the raw score with the scores from the risk-linking features.

Autonomous Retraining of AML Threat Scores

As entities are investigated based on their AML Threat Scores, new SARs are created. The transactions associated with these scores can be rapidly integrated back into the model to improve detection of similar cases. This section describes the feedback loop which autonomously retrains the AML Threat Score at a faster cadence than is possible with manual retraining.

The auto-retraining module is important for optimizing human investigation time, preventing investigators from having to work cases that are very similar to those already deemed false positives, and it is essential when updating BList tuple tables of risky fingerprints associated with known or suspected SAR cases.

The auto-retraining module (FIG. 5) is run periodically to update the model design parameters (including network weights, calibration and soft-clustering parameters) and BList fingerprint tables, e.g., on a weekly or monthly cadence. To keep training times short, the parameters updated are limited to those in the scoring and calibration module and, in certain variations of the system, the collaborative profile archetypes and associated clusterings. To keep the data set size manageable, the number of non-SAR cases is downsampled. The scoring mechanism retrained by the auto-retraining module can include one or more of: linear regression, logistic regression, multi-layer perceptron neural networks, scorecards, support-vector machines, adaptive models, and decision trees.

A key component of the auto-retraining module is the quality assurance process (diamond box in FIG. 5), which determines if the automatically learned model parameters are acceptable, or if it is safer and more accurate to use the previously learned model. Generally, the newly trained model may be considered a “challenger” to the previously learned “champion” model, and the challenger model would need to perform better on a suitable validation data set (e.g., from accounts that had not been used in either retraining process, or that came from a different period of time). Multiple challenger models can be created under varying training conditions, such as different regularization parameters and model architectures. For each specific algorithm, other quality checks are performed. For logistic regression or neural network models, the numerical stability of the solution can be evaluated, and the solution can be rejected if the coefficient values have, e.g., high variance (indicating the model is overly sensitive to certain input features).
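
As a sketch of that champion/challenger quality-assurance step (the validation callable and its metric are assumptions; the disclosed checks also include algorithm-specific tests such as coefficient stability):

```python
def select_model(champion, challengers, validate):
    """Keep the previously learned champion unless a newly trained challenger
    performs better on a hold-out validation set not used in retraining."""
    best_model, best_metric = champion, validate(champion)
    for challenger in challengers:
        metric = validate(challenger)
        if metric > best_metric:
            best_model, best_metric = challenger, metric
    return best_model


# Usage: validate() might, for example, compute SAR detection rate at a fixed
# review volume on held-out accounts from a different time period.
# new_champion = select_model(champion_model, retrained_models, validate)
```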

FIG. 5 illustrates an auto-retraining module for automatically updating the scoring and calibration module during system operation (run-time). Periodically, a signal initiates the retraining process. SAR alerts and cases covering a recent time period are retrieved from a data store, and selected based on applicability for use in retraining. The relevant features associated with those cases are retrieved from the feature data store.

Cloud Based Consortium Model for AML

To fully address the problem of money laundering, the flow of funds must be tracked across multiple financial institutions. The AML Threat Score solution can be deployed in the cloud, which is a set of servers. This allows the predictive model to have access to SAR information and other risk factors from a variety of institutions that are members of the AML Consortium. The predictive model can therefore have access to a wider view of risk information than would be possible when viewing only transactions and customer information from a single institution. For example, in the cloud-based consortium, the efficient risk-linking (see above) can propagate risk information from accounts at multiple banks, which is not possible for an on-premises deployment where the profiles only have a view into transactions where one of the counter-parties is a customer of that institution. When SARs are submitted to the consortium, the associated BList fingerprints are also contributed, such that AML Threat Scores for all associated institutions can be informed of detected suspicious activity.

Because emerging payment systems such as mobile payments and cryptocurrencies may have limited interaction with traditional financial institutions, there are more limited opportunities to detect laundering which involves them.

To improve detection, a cloud-based data store integrates information from multiple sources, including:

-   a) Entities associated with legal and illicit BitCoin exchanges.
-   b) Entities associated with mobile payment and remittance networks.

KYC on-boarding and customer updates can include questions on such topics, as well as information from public records and regulators. While it may be difficult to get timely information on illicit BitCoin sources, it is important to collect and centralize information on legal exchanges and administrators (miners, etc.). Regulations now require that certain entities associated with BitCoin be classified as “money services businesses” (MSBs) and comply with appropriate regulations, such as the Bank Secrecy Act. In particular, if an entity exchanges virtual currencies for real currencies, or acts as an intermediary transferring virtual currency, the entity is treated as an MSB. In some cases BitCoin miners may not be treated as MSBs, e.g., if they only mine BitCoin and use it for personal purchases (without exchange to real currency). The information required for compliance with these regulations is an important part of the data specification, which allows the profiles, BLists, and other features of the system to properly evaluate BitCoin-related accounts and customers. Having information on legal BitCoin operators helps the AML Threat Score learn their behavior, and detect changes in their behavior that may signal new illicit use (without explicit knowledge of the operator).

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user, and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

1-24. (canceled)
25. A computer-implemented system for improving predictive capabilities of a machine learning system, the system comprising at least one processor and a machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: retrieving one or more profiles from the machine learning system's data store that stores a plurality of profiles associated with a plurality of behavior sorted lists, at least a first profile out of the plurality of profiles being associated with an input data record for a first entity from among a plurality of entities; updating the first profile to recursively compute summary statistics of behavior of the first entity by adding an observed behavior represented by the input data record to at least one of a plurality of behavior sorted lists with a first weight; decaying weights of existing observed behaviors; comparing one or more entries between at least two of the plurality of behavior sorted lists to generate a numerical value representing a consistency between entries in a behavior sorted list for the first entity and the behavior sorted list of other entities; executing one or more distance models to the plurality of behavior sorted lists to determine a variation of the consistency between the entries in the at least two behavior sorted lists according to the numerical value; and generating a threat score utilizing self-calibrating outlier models based on entity recursively summarized profile behavior and recurrences in an entity's behavior sorted list due to variation of matches on behavior sorted lists of risk entities.
26. The system of claim 25, wherein at least a first profile is created for the first entity, the first profile being formed as a data structure that captures statistics of the first entity's behavior without storing a record of past activity of the first entity.
27. The system of claim 26, wherein the first profile comprises a plurality of behavior sorted lists and recursive features.
28. The system of claim 27, wherein the behavior sorted lists are formed of tuples of entries including a key, a weight, and a payload that represent a frequently-observed behavior of the first entity.
29. The system of claim 27, wherein the recursive features are configured to summarize the frequently-observed behavior of the first profile.
30. The system of claim 26, wherein the data structure captures statistics of the first entity's behavior without storing a record of past activity of the first entity.
31. The system of claim 25, wherein an alert is generated to identify an activity as suspicious, in response to determining that the activity is anomalous for that entity based on the first entity's prior history and peer group.
32. The system of claim 25, wherein a payload of at least one entry of the first profile includes recursive features.
33. The system of claim 25, wherein the payload of at least one entry of the first profile includes archetype distributions, derived archetype profile features, and soft clustering misalignment scores.
34. The system of claim 25, wherein the input data record is a transaction performed by the first entity and, based on determining a degradation of a set of threat scores and using one or more auto-retraining mechanisms, the machine learning system is retrained.
35. A computer-implemented method for improving predictive capabilities of a machine learning system, the method comprising: retrieving one or more profiles from the machine learning system's data store that stores a plurality of profiles associated with a plurality of behavior sorted lists, at least a first profile out of the plurality of profiles being associated with an input data record for a first entity from among a plurality of entities; updating the first profile to recursively compute summary statistics of behavior of the first entity by adding an observed behavior represented by the input data record to at least one of a plurality of behavior sorted lists with a first weight; decaying weights of existing observed behaviors; comparing one or more entries between at least two of the plurality of behavior sorted lists to generate a numerical value representing a consistency between entries in a behavior sorted list for the first entity and the behavior sorted list of other entities; executing one or more distance models to the plurality of behavior sorted lists to determine a variation of the consistency between the entries in the at least two behavior sorted lists according to the numerical value; and generating a threat score utilizing self-calibrating outlier models based on entity recursively summarized profile behavior and recurrences in an entity's behavior sorted list due to variation of matches on behavior sorted lists of risk entities.
36. The method of claim 35, wherein at least a first profile is created for the first entity, the first profile being formed as a data structure that captures statistics of the first entity's behavior without storing a record of past activity of the first entity.
37. The method of claim 36, wherein the first profile comprises a plurality of behavior sorted lists and recursive features.
38. The method of claim 37, wherein the behavior sorted lists are formed of tuples of entries including a key, a weight, and a payload that represent a frequently-observed behavior of the first entity.
39. The method of claim 37, wherein the recursive features are configured to summarize the frequently-observed behavior of the first profile.
40. The method of claim 36, wherein the data structure captures statistics of the first entity's behavior without storing a record of past activity of the first entity.
41. A non-transitory computer program product storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising: retrieving one or more profiles from a machine learning system's data store that stores a plurality of profiles associated with a plurality of behavior sorted lists, at least a first profile out of the plurality of profiles being associated with an input data record for a first entity from among a plurality of entities; updating the first profile to recursively compute summary statistics of behavior of the first entity by adding an observed behavior represented by the input data record to at least one of a plurality of behavior sorted lists with a first weight; decaying weights of existing observed behaviors; comparing one or more entries between at least two of the plurality of behavior sorted lists to generate a numerical value representing a consistency between entries in a behavior sorted list for the first entity and the behavior sorted list of other entities; executing one or more distance models to the plurality of behavior sorted lists to determine a variation of the consistency between the entries in the at least two behavior sorted lists according to the numerical value; and generating a threat score utilizing self-calibrating outlier models based on entity recursively summarized profile behavior and recurrences in an entity's behavior sorted list due to variation of matches on behavior sorted lists of risk entities.
42. The computer program product of claim 41, wherein at least a first profile is created for the first entity, the first profile being formed as a data structure that captures statistics of the first entity's behavior without storing a record of past activity of the first entity, and the first profile comprises a plurality of behavior sorted lists and recursive features.
43. The computer program product of claim 42, wherein the behavior sorted lists are formed of tuples of entries including a key, a weight, and a payload that represent a frequently-observed behavior of the first entity, and wherein the recursive features are configured to summarize the frequently-observed behavior of the first profile.
44. The computer program product of claim 42, wherein the data structure captures statistics of the first entity's behavior without storing a record of past activity of the first entity.